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The Hadamard transform of pj |5] provides a way to work with stochastic models 
for sequence evolution without having to deal with the complications of tree space 
and the graphical structure of trees. Here we demonstrate that the transform can 
P_] be expressed in terms of the familiar P[r] = e^M formula for Markov chains. The 

CM 

key idea is to study the evolution of vectors of states, one vector entry for each taxa; 
we call this the n-taxon process. We derive transition probabilities for the process. 
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Significantly, the findings show that tree-based models are indeed in the family of 
(multi-variate) exponential distributions. 
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1 Introduction 



1.1 Stochastic models and the Hadamard transform 

Stochastic models for sequence evolution now play a part in most phylogenetic analyses. 
Given a tree, and a set of branch lengths, the models determine a probability distribution 
for the patterns of nucleotides / amino acids observed at a site in the alignment (the site 
patterns). The real difficulty of these tree-based models is that they are, indeed, based 
on a tree. The graphical structure of the tree is intrinsic to the probability formulae. 
Here, as in many other contexts, spaces of graphical structures are difficult to work with, 
both for the statistician and computer scientist. Indeed the state-of-the-art optimisation 
methods go little beyond local search techniques. 

The Hadamard transform [6j circumvents many of these issues. Each tree is encoded 
as a vector called a split vector or spectrum q. Hendy and Penny showed that there is 
a general formula taking a spectrum to the vector of site pattern probabilities for the 
tree: 

p = H- 1 exp(Hq). (1) 



The matrix H is a Hadamard matrix, see Section 3.2 This formula works for all trees 
and, once the tree has been coded as a vector, the tree structure plays no further part in 
the computation. Thus the Hadamard transform provides a model into which all tree- 
based models naturally nest. Refer to p~H [5] for excellent introductions to Hadamard 
transform methods. 

The transform is a useful tool for many theoretical investigations. However there are 
also important practical advantages of the approach. For example: 

1 . It provides a way of searching (or potentially, sampling) tree space that does not in- 
volve passing from individual tree to individual tree. One can invert the Hadamard 
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transform to construct a spectrum q from the observed pattern probabilities and 
then use q to infer a tree. 

2. Because the transform provides a model that generalises trees it can be used to 
test the hypothesis "does this data actually come from a tree?" An analysis of 
this sort does not need to be hypothesis driven: phylogenetic network software like 
SplitsTree [TT] and Spectronet [10] allow one to visualise a spectrum q and see 
where it violates tree based models. 

There have been many reformulations of the original Hadamard transform formula. Early 
proofs of the transform were based on an interpretation in terms of path sets 16] ; these 
were recently extended in |8] . The transforms were recast in terms of Fourier transforms 
on Abelian groups [H [12] 113 US]. Bryant [1] used this algebraic machinery to show 
that the transform can be understood in terms of evolutionary models on phylogenetic 
networks. Sturmfels and Sullivant [13] view the transform as a change of coordinates 
and, like [I], use it to study invariants on the phylogenetic tree models. 



There are two important limitations of the Hadamard transform. The first is the running 
time: a full Hadamard transform takes exponential time: current analyses are limited to 
a maximum of 30 taxa. This could be remedied using approximations (such as distance 
based methods like NeighborNet [3]) or Monte-Carlo strategies. 

The second limitation is the restriction on the substitution models with the Hadamard 
transform to group-based models. For nucleotide data, this means that one can only 



use subsets of the K3ST model (see Section 2.1). This restriction is probably the most 



important barrier to use of the transform. It was one of the motivations behind the 
reformulation of the transform outlined in this paper. The problem of how to remove 
this restriction is still open. 
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1.2 Contribution of this paper 



In this paper we derive a new formulation of the Hadamard transform. Our proof that 
the transform works does not use path sets or Fourier transforms, at least not explicitly. 
The transform is established using basic matrix analysis that does not go much beyond 
the tools the are routinely used in phylogenetic analysis. 

The key idea in the current paper is the n-taxon process. Consider a phylogenetic tree 
with times marked. At any particular time, every taxa has a unique ancestral lineage 
and this lineage has a unique state. Let vt denote the vector of ancestral states, so that 
vt[i] is the state of the ancestor of taxa i at time t. The n-taxon process is the continuous 
time Markov chain that describes the evolution of these vectors over time. 

We derive the transition probability matrix P[r] for this process for a given vector r of 
branch lengths. It is simply the exponential exp(Q[r]) of a linear combination of rate 
matrices for the branches. The formula for the Hadamard conjugation falls straight out 
of the formula for P[r], recast in appropriate notation. In fact the vector q as defined 
in [6] is simply the first column of the rate matrix Q[r]. 

The entry P[r] uv gives the probability that the final state is v (these are the observed 
states for each taxa) given that the initial vector of states is u. In the standard model 
for phylogenetic analysis that states for the initial vector would be all the same, corre- 
sponding to the fact that in a phylogeny the root is ancestral to all taxa. The same may 
not apply to problems from population genetics. 

In this paper we examine only two models, the binary symmetric model and the K3ST 
model. In practise, these are the only two models used with the Hadamard conjugation. 
The results here could be generalised to general group based models [H [161 0] • 
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2 From tree based models to the n-taxon process 



2.1 Tree based models 



We begin with a brief outline of the standard models used for sequence evolution; for 
more details and extensive references, see [2| \5[ H3]. 

As usual, we make the assumption that different sites in a sequence evolve independently 
from each other, so we can consider just the evolution of a single site. A state is drawn 
at the root from a fixed distribution it. The evolution of the site then proceeds along the 
branches from the root to the leaves of the tree. Along each branch, substitutions occur 
according to a continuous time Markov chain, the chain specified by its (instantaneous) 
rate matrix Q. 
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Figure 1: (a) An example of state evolution on a tree under the binary symmetric model. The 
horizontal axis is proportional to time. The state was drawn at the root, and substitutions 
occurred on edges a,b,c,e,g, giving the pattern 0101 at the leaves, (b) The corresponding n-tuple 
process. At each stage, the value of the process gives the ancestral states for each taxon. This 
changes five times from left to right. 



We consider only two choices for Q, corresponding to the binary symmetric model and 
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the K3ST model. They have respective rate matrices 
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(2) 

The values a, (3, 7 are positive reals chosen so that a+/3+7 = 1. These parameters control 
the rates of three types of substitution as represented in figure |2j Type I substitutions 
correspond to DNA transitions; types II and III are types of transversion. Different 
sub-models are obtained by making different rates equal to each other. 




G < n > T 

Figure 2: The three transition types under the K3ST model 



Both the binary symmetric model and the K3ST model have uniform stationary distri- 
butions. This is used for the distribution n at the root. 

Example 1 We illustrate the binary symmetric model on a four taxa tree in figure [I] 
(a). The root distribution is uniform: in this case was drawn (with probability 1/2). 
Substitutions occurred on edges a,b,c,e,g, giving the pattern 0101 at the leaves. 

Let Pij(t) denote the probability that the state at the end of a branch of length t is j 
given that the state at the beginning of the branch is %. These transition probabilities 
are given by the matrix exponential 



P(t) = exp(Qt) = I + Qt + Q 2 - + Q 3 - + 



(3) 
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The standard technique for computing this exponential (at least in phylogenetics) is to 
first diagonalise the matrix Q as 

Q = VAV- 1 (4) 
where A is a diagonal matrix, and then use the identity 



exp(Qt) = Fexp(At)^ 1 , 



(5) 



noting that exp(At) is a diagonal matrix with values exp(Ajjt) down the diagonal. 
Example 2 In the binary symmetric model, Q = 



Li ^ 



giving 



1 -1 



. We have 



Q 




P(t) 



1 _|_ 1 p 2t 1 1 a 2t 

2 T 2° 2 2° 

1 1 1 _|_ 1 p~ %t 

2 2 C 2 T 2 C 



(6) 



These probabilities are often presented with t scaled by a constant fx. Here we assume 
that time has been scaled so that fj, = 1. 



2.2 The n-taxa process 

We introduce an alternative way of describing the evolution of sequences along a tree. 
The main advantage of the approach is that the dependence on tree structure, and all 
of the complications that it introduces, can be encoded away. 

We will be working with a continuous time Markov chain on a much larger state space. 
Suppose that we have n taxa. Consider the set of all vectors assigning a state (e.g. a 
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binary value or a nucleotide) to every taxa. In the binary case there are 2 n of these 
vectors; in the 4-state case there are 4 n . The set of these vectors will be the state space 
of our process, however we will refer to these vectors as values of the process in order to 
minimise confusion with binary or nucleotide states. 

Consider again the tree based model. At a particular time t, let vj denote the vector of 
states of the ancestors for each of the n taxa. The n-taxa process is the continuous time 
Markov chain describing the evolution of these vectors. 

Example 3 In figure [7] the initial value of the process is that the ancestors of all taxa 
have state 0. Hence vj = [0,0,0,0]. Between time to and t± there is a substitution (on 
branch a). This occurs in the population that is ancestral to all of the taxa, so the effect 
is to change the value of the n-taxa process to = [1,1,1,1]. On branch b there is 
another substitution, though this is ancestral only to taxa 1 and 2. Thus, at time ti we 
have Vf 2 = [0,0, 1, 1]. The process continues until, at the present time (t^) the value of 
the process equals the observed states. 

In a (slight) abuse of terminology we will say that a taxon is a descendant of a branch if 
it is a descendant of the population/ancestors represented by that branch. For example 
in Figure [T] taxa 3 and 4 are the descendants of branch c. 

The n-taxon process is a continuous time Markov chain, but it is not time-homogeneous. 
The transition probabilities depend on which lineages at a particular time are ancestral 
to which taxa. In the example, the rates of substitution for the n-taxa process will 
change at the time points t±,t2, ■ ■ ■■ The process is homogeneous during the intervals 
between these time points. In what follows, we derive the rate matrices for the n-taxon 
process over each time interval. 

We make the following notational conventions to lessen confusion between the different 
rate matrices involved. 
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1. We use Q to denote the rate matrix for the underlying (binary symmetric or K3ST) 
substitution process (binary symmetric or K3ST). 

2. We use QW to denote the rate matrix for the n-taxon process during time interval 
R-l) U]- 

3. We use to denote the rate matrix for the n-taxon process restricted to sub- 
stitutions occurring on a given branch b. 

We make extensive use of the Kronecker product of matrices. Given an m x n matrix 
X and a p x q matrix Y the Kronecker product (or tensor product) of X and Y is the 
mp x nq matrix 

/ 



X®Y 



X U Y X 12 Y ••• X ln Y^ 
X 21 Y X 22 Y ••• X 2n Y 



(7) 



\^X m \Y • • • • • X mn Y J 
The elements of a Kronecker product can be indexed by vectors so, for example, 



{X (g> ^)[i,p],[j,q] = XijY pq , (X <g> Y <g> 2')[i I p lS ] I [j ig ,J] = XijY pq Z st . (8) 

See [9] for a detailed introduction to the Kronecker Transform. We will make use of the 
following properties 

Lemma 4 Let W, X, Y, Z be matrices with appropriate dimensions. 

1. (X®Y)®Z = X®(Y®Z). 

2. {X (g) Y){W ® Z) = XW YZ. 

3. Suppose that X, Y are non-zero. Then X ®Y is diagonal if and only if X and Y 
are diagonal. 



We print matrices formed from Kronecker products in boldface. 



3 Transition probabilities for the n-taxon process: binary 
symmetric case 



3.1 Substitutions down a single branch: rate matrix 

For the moment, consider only substitutions that occur along one particular branch b 
in the tree. We ignore substitutions occurring at the same time along other branches. 
Substitutions occur along branch b at rate 1. These substitutions affect only the taxa 
that are descendants of the branch b; let A be that set of taxa. A substitution along 
the branch corresponds to flipping the entries v$[i] for which i € A. These are the only 
substitutions in the n-taxon process restricted to the branch, and these substitutions 
occur at rate 1 . Thus the rate matrix R( b ) of this restricted process is given by 



±V U V 



1 if u[i) v[i] exactly when i G A; 
— 1 if u = v; 
otherwise. 



(9) 



Example 5 In the example in Figure 



( 



we have Q 



1 1 



. All elements of 



1 1 



are zero, except the diagonal elements (all —1) and the elements 



R 



(&) 



> R (6) 

,0,0,0], [1,1,0,0]' [0,0,0,1], [1,1 



R (6) 

0,1]' • • ' ' [1,1, 1,1], [0,0,1,1]' 



(10) 



R 



(6) 

[1,1,0,0], [0,0,0,0] 



R (6) 

' ,[1,1,0, 



,R 



(6) 



1], [0,0,0,1]' • • • ' "[0,0,1,1], [1,1,1,1]' 



(11) 



which all equal 1. 
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We re-express R( 6 ) in terms of the Kronecker product of simple 2x2 matrices. Define 

Lemma 6 Let A be the set of taxa that are descendants of the population represented by 
branch b. For each i = 1, 2, . . . , n set MM = E if i S A and = I otherwise. Then 

R (6) = M (i) M (2) ^ . . . q M (n) _ j ^ 

Example 7 Consider branch c in figure^ As A = {3,4} we have 

R (c) = /®/®£®£-I. (13) 

3.2 Substitutions down a single branch: transition probabilities 

In this section we show how to diagonalise the matrices . One of the attractions of 
the Kronecker product is that we can generally obtain a diagonalisation of the product 



matrix in terms of its factors. 



Let H := 



1 1 



1 -1 



and A = HEH' 1 



1 



. Thus HEH- 1 and HIH- 1 



-1 



are both diagonal. We use Kronecker product to construct a matrix that diagonalises 
RW. 

The n th order Hadamard matrix is defined 

H (n) =H®H®---®H, (14) 
an ra-fold Kronecker product. Note that (H^))" 1 = 2~ n H( n ). 

Lemma 8 Let H = H^ n ^, the n th order Hadamard matrix, and let R 6 be the rate matrix 
for the n-taxon process restricted to branch b, binary symmetric case. Let A be the set 
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of taxa that are descendants of branch b. Then 



:= HR<% 



(15) 



is a diagonal matrix, with 



= (-i)l{^-M=i}l _ i 



(16) 



for all state vectors u. 
Proof 

Define matrices as in Lemma rol Then 



H(RW + I)H 1 = (HM^H- 1 ) ® ■ ■ • <g> (HM^H- 1 ). 



(17) 



Hie A then HM^H^ 1 = HEH^ 1 = A while if i g A we have HM^H~ l = I. The 
Kronecker product of diagonal matrices is diagonal, so A is diagonal. 

For the diagonal values, note that 



i=i 



and that 



— 1 if i 6 -A and -u[i] = 1 ; 
1 otherwise. 



(18) 



(19) 



□ 



The transition probabilities down branch b now follow directly from the diagonalisation, 
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since 

exp(R^) = H 1 exp(A( 6 ))H (20) 
and exp(A( 6 )) is a diagonal matrix with entries exp(Auu) down the diagonal. 

3.3 Transition probabilities over multiple lineages 

During the intervals between time points there will be, in general, several lineages evolv- 
ing independently. Because of this independence, the rate matrix QW for the substitu- 
tion process over all lineages is simply the sum of the rate matrices for the individual 
branches present at that time point. 

Example 9 In figure [7] the rate matrix for the n-taxon process between t2 and t$ equals 

q(3) = R (d) + R (e) + R ( C ) pi) 

as branches d, e, c are present during this interval. 

Between time points to an d the present time t^ we therefore have a sequence of rate 
matrices Q^ 2 **, ■ • • , Q( fc ). Each rate matrix QW equals the sum of the rate matrices 
R,( fe ) for all branches b present during the interval During each interval, the 

probability transitions are given by the standard exponential formula 

P«=exp(Q«(t l -U-ij) (22) 

so the transition probabilities between time to & n d time tk are given by 

P = p(l)p(2) ...p(*) 

= e QW(ti-to) e Q (2 H*2-ti)... e QW(tfc-^-i) (23) 
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Now we make a critical simplification. The rate matrices R/ 6 ) down each branch are aU 
diagonalised by the Hadamard matrix H. Hence so are the sums QW. Since every matrix 
in the product (23) is diagonalised by the same matrix H, the rate matrices 



all commute. If two matrices X and Y commute then exp(X) exp(Y) = exp(X + Y). 



Applying this identity to (|23|) gives 

/ k \ 
" 1 (24) 



exp (j^Q^iU-ti-!)) 



Now examine the sum Yli=i Q®(*i — U—l)> a linear combination of the individual branch 
rate matrices H^ b \ The coemcient of each matrix R^) is equal to the total length of 
time that the branch is present: the length of the branch. Let r denote the vector of 
branch lengths. We have now established the following theorem. 

Theorem 10 Let P[r] be the matrix of transition probabilities in the n-taxon process 
for the binary symmetric case given a branch length vector r. Define 

QM = ^R ( \ (25) 
b 

where b ranges over branches in the tree, R^*) is the matrix given in Lemma [fij and t& 
is the length of branch b. Then HQ[r]H _1 is a diagonal matrix and 

P[r] = exp(Q[r]) (26) 



The probability distribution for a tree can be recovered from (26) by noting that, at 
the root, the process is = [0, 0, ... , 0]' with probability ttq = 1/2 and 1 = [1, 1, . . . , 1]' 
with probability 1/2. If u is the pattern of states at the leaves, then the probability of 
observing u equals 

P = ^Pou[r] + ip lu [r]. (27) 
14 



Interestingly, (26) also applies to the case when there is not a single common ancestor for 
the taxa, a feature that may well prove useful in population genetics applications. 



3.4 Recovering the Hadamard formula 

The Hadamard conjugation formula [6, T3J assumes that one taxon has all zero states and 
gives the probabilities for patterns on the remaining taxa. We can retrieve the formula 



almost directly from Theorem 10 giving a new proof for the Hadamard conjugation. 
This new derivation explains why the zero entry of the vector q in [6] is chosen to make 
the sum of all entries zero: the vector q is simply a row out of the rate matrix Q. 

Theorem 11 [6] Suppose that the tree has taxa at the root with state 0. For each non- 
zero vector u (indexed by the remaining taxa) let q u be the length of the branch with 
descendants {i : u[i] = 1}, if there is such a branch in the tree, and zero otherwise. 
Let qo be the negative of the sum of all the branch lengths in the tree. Let p u be the 
probability of observing the pattern u at the leaves. Then 

p = ET 1 exp(Hq). (28) 



Here the exponential is entry-wise. 
Proof 

Let denote the vector [0,0, . . . ,0]'. We seek the probabilities p u = PouM- As P[r] is 
symmetric, P 0u [t] = P u0 [r]. 

The vector q is the 0-column of Q[r] Let A = HQ[r]H _1 , so that HQ = AH. The 
0-column of H is all ones, so the 0-column of AH is made up of the diagonal entries of 
A. Hence the entries in Hq are the entries along the diagonal of A. Taking entry-wise 
exponentials, we have that exp(Hq) equals the entries along the diagonal of exp(A) and 
so exp(Hq) is the first column of exp(A)H. The formula now follows from that fact that 
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PuoM is the O-column of 

P[t] = He A H 1 

= 2-( n " 1 )He A H 
= H 1 e A H. 

□ 



4 Transition probabilities for the n-taxon process: K3ST 
model 

We now extend the results of the previous sections to the K3ST model. In the interests 
of brevity we only outline the key steps in the derivation. 

4.1 Substitutions down a single branch: rate matrix 

Under the K3ST model there are three types of substitution. Instead of defining one 
matrix E as above, we define three matrices 



Ei 



10 
10 
1 
^0 10/ 



En 



10 

1 

10 

\o i o oy 



Em 



1^ 

10 

10 

1 o o oy 



, (29) 



so that Q = aEi + (3E H + 7 £ m - (a + (3 + 7)/. 



Lemma 12 Let A be the set of taxa that are descendants of the population represented 
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by branch b. For each i = 1,2, ... ,n set Mj = Ej if i G A and AfW = / otherwise. 

(i) (i) 

Likewise for M j / and Mj/j. Then 



(i) 



Mj"J - I. (30) 



The matrix is indexed by vectors of states. We number the states 0, 1, 2, 3 corre- 
sponding to A, C, G, T respectively. 

4.2 Substitutions down a single branch: transition probabilities 

We use the same trick as before to diagonalise the rate matrix R,( b ) in the K3ST case: 
using the properties of the Kronecker product. Let 

(l 1 1 l\ 

I- 1 1 -1 

II- 1-1 
1-1-1 1 



H = 



(31) 



then define A/ = H~ 1 E I H, Ajj = H~ x E n H and A IH = H^EmH. Then A/ = 
diag(l, —1, 1, —1), Ajj = diag(l, 1, —1, —1) and Ani = diag(l, —1, —1, 1). 



For this case, we let H denote the ra-fold product H (g) H < 
the 2n th order Hadamard matrix. 



H, which is equal to 



Lemma 13 Let H = H^ 2 ™), the 2n th order Hadamard matrix, and let R 6 be the rate 
matrix for the n-taxon process restricted to branch b, K3ST case. Let A be the set of 
taxa that are descendants of branch b. Then 



:= HR^LT 



(32) 
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is a diagonal matrix, with 



A ( ul = a(-l) l{ieA:u[il=1 or 3}l + /3(-l)K ieAu M=2 or 3 >l + 7 (_i)K i eAu[i]=l or 2}| _ 1 

(33) 

for all state vectors u. 

The transition probabilities down branch b now follow directly from the diagonalisation, 
since 

exp(R^) = H 1 exp(AW)H (34) 
and exp(A( 6 )) is a diagonal matrix with entries exp(Auu) down the diagonal. 

4.3 Transition probabilities over multiple lineages 

The progression from rate matrices for branches to rate matrices for the entire n-taxon 
process is almost identical in the K3ST case as in the binary symmetric model case. 
During each time interval [tj_i,ij] the rate matrix QW for the ra-taxon process is the 
sum of the rate matrices for branches present at that time. They are all diagonalised 
by the 2n th order Hadamard matrix H, so commute, and we have 



(35) 



P=e X p|^Q«(i 1 -t i _ 1 )j . 

Furthermore, given the vector r of branch lengths we have 

k 

^2 q ( % -ti-o =J2 R(6)r <» ( 36 ) 



establishing the following analogue to Theorem 10 



Theorem 14 Let P[r] be the matrix of transition probabilities in the n-taxon process 
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for the binary symmetric case given a branch length vector r. Define 



QM = E R(b) 



n 



(37) 



b 



where b ranges over branches in the tree, Fi/^ is the matrix given in Lemma 12, and r& 



is the length of branch b. Then HQ[r]H 1 is a diagonal matrix and 



P[r]=exp(Q[r]) 



(38) 
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